We propose a network-filtering method, the Triangulated Maximally Filtered Graph (TMFG), 
that provides an approximate solution to the Weighted Maximal Planar Graph problem. The 
underlying idea of TMFG consists in building a triangulation that maximizes a score function 
associated with the amount of information retained by the network. TMFG uses as weights any 
arbitrary similarity measure to arrange data into a meaningful network structure that can be used 
for clustering, community detection and modeling. The method is fast, adaptable and scalable to 
very large datasets; it allows online updating and learning, since new data can be inserted and deleted 
with combinations of local and non-local moves. TMFG permits readjustments of the network in 
response to changes in the strength of the similarity measure. The method is based on local 
topological moves and can therefore take advantage of parallel and GPU computing. We discuss 
how this network-filtering method can be used intuitively and efficiently for big data studies and 
its significance from an information-theoretic perspective. 

I. INTRODUCTION 

We are witnessing interesting times, rich in information that is readily available to us all. Using, 
understanding and filtering such information has become a major activity across science, industry 
and society at large. Our society has become a global information processing system where 
news propagates and impacts individuals and the economy at increasingly fast rates with 
increasingly large effects. It is therefore important to have tools that can analyse this information 
while it is generated and that can provide ways to reduce complexity and dimensionality while 
keeping the integrity of the dataset. Information content and flow are often associated with large 
degrees of redundancy both in time (repeating and scaling patterns) and across different variables 
(similarity, dependency and causality). Redundancy is often used to reinforce meaning or, 
more simply, it signals recurring patterns that have high statistical significance and are 
therefore important. In this paper we propose to use such redundancy to build an information-based 
network that retains the relevant part of the data-interdependency structure. The structure 
of this network is a representation of the information in the dataset and such information can 
be efficiently analysed by using network-theoretic tools. 

The idea of using redundancy - mostly correlation coefficients - to filter information in complex 
datasets by building sparse networks retaining relevant edges only has been very actively studied 
in the literature mostly by means of two approaches: i) the minimum spanning tree (MST) 
mm-, ii) the planar maximally filtered graph (PMFG) [SHS]. The common idea underneath 
these two approaches is to filter a dense matrix of weights by retaining the largest and most 
significant possible subgraph while imposing global constraints on the topology of the resulting 
network. In particular, in the MST approach edges with the largest weights (e.g. correlations) 
are retained while constraining the subgraph to be globally a (spanning) tree. Similarly, in 
the PMFG construction the largest weights (e.g. largest correlation coefficients) are retained 
while constraining the subgraph to be globally a planar graph (see [2H1]). Both the MST and 
the PMFG are particular cases of simplicial complexes; our proposed method can be extended 
to more general simplicial complexes with different constraints (for instance on the topological 
genus, or on the size of the largest complete subgraph - or clique). 

The PMFG is a greedy solution of the Weighted Maximum Planar Graph (WMPG) 
problem: given a complete edge-weighted graph find a planar subgraph which is maximal (i.e. 
no edge can be added without destroying planarity) and such that the sum of the edge weights is 
maximum. The problem is known to be NP-complete (see [B] for a proof and [7] for a review), but 
algorithms providing sub-optimal solutions are known. For instance, an approximation algorithm 
with a guaranteed performance ratio of | for complete graphs is discussed in [5]. 

The PMFG has a richer information content than the MST, with a larger number of edges (the 
PMFG has $3p-6$ edges, while the MST has $p-1$, where $p$ is the number of vertices), and it contains 
3- and 4-cliques. However, the network is still sparse, retaining only $3p-6$ edges out of the $p(p-1)/2$ 
edges of the complete graph $K_p$ associated with the original dense matrix of weights. 
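
To make the sparsity concrete, the following minimal Python sketch evaluates the three edge counts quoted above for a given number of vertices. The function name `edge_counts` is introduced here for illustration only and is not part of the method.

```python
def edge_counts(p: int) -> dict:
    """Edge counts of the MST, of a maximal planar graph and of the complete graph K_p."""
    if p < 3:
        raise ValueError("the formula 3p - 6 requires p >= 3")
    return {
        "mst": p - 1,                  # spanning tree
        "pmfg": 3 * p - 6,             # maximal planar graph
        "complete": p * (p - 1) // 2,  # complete graph K_p
    }


counts = edge_counts(100)
# Fraction of the edges of K_p retained by a maximal planar filtered graph
fraction_kept = counts["pmfg"] / counts["complete"]
```

For $p = 100$ this gives 99, 294 and 4950 edges respectively, so a planar filtered graph retains roughly 6% of the edges of the complete graph.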

Planar filtered graphs are powerful tools to study complex datasets. It has been shown in [9] 
that by making use of the 3-clique structure of the PMFG a clustering structure can be extracted 
allowing dimensionality reduction that keeps both local information and global hierarchy in a 
deterministic manner without the use of any prior information. Applications of Planar filtered 
graphs to financial data-sets can meaningfully identify industrial activities and structural market 
changes mm- Planar filtered graphs can be used to diversify financial risk by building a well- 
diversified portfolio that effectively reduces risk by investing in stocks that occupy peripheral 
regions of the graph [12]. Planarity ensures easy visualization, since the network can be 
drawn without edge crossings. Another appealing advantage of planar filtered 
networks concerns graphical modeling (e.g. Markov Random Fields m) where planarity (which 
limits the treewidth of the junction tree of the filtered graph) grants that some exact inference 
algorithms can be performed in an efficient fashion (see mm)- However, the algorithm so 
far proposed to construct the PMFG is computationally costly and cannot be applied to large 
datasets. There is therefore scope to search for novel algorithms that can construct planar filtered 
graphs in a computationally efficient way. In the present paper we introduce such an 
algorithm, the TMFG, which produces planar filtered graphs by optimizing an 
objective function (which we shall call the “score function”) through local topological moves called 
$T_1$ and $T_2$ [TB] and the ‘Alexander move’ $A$ [T7j, in conjunction with a ‘vertex-swap’ operator $S$. 

The TMFG algorithm also has the advantage of allowing ‘online’ updates of the planar graphs. 
Furthermore, the TMFG can be naturally applied to multipoint dependency measures associated 
with the 3- and 4-clique structure. When only the $T_2$ and $S$ operators are used the algorithm 
produces triangulated (or chordal) graphs together with their structure of cliques and separators. 
Chordal graphs have appealing properties. On the one hand, they are very well suited to modelling 
with Markov Random Fields (MRF), since they admit a closed-form solution for the Maximum 
Likelihood Estimate (MLE) of the joint probability (see [H US]). On the other hand, chordal graphs 
are perfect graphs and as such are known to have polynomial-time solutions for problems that 
are otherwise harder (e.g. the graph coloring, maximum clique and maximum 
independent set problems, see [SS]). Graphical models with a chordal underlying graph also have a 
particularly useful representation of the joint probability distribution. In the case of Gaussian graphical models the 
structure of the graph represents partial correlations between variables and the non-zero entries of 
the concentration matrix (the inverse of the covariance matrix also called precision matrix |18ll21l 
\n\ ] coincide with the edges of the graph. Additionally, the algorithm has the advantage that it 
is not restricted to planar topologies allowing higher-genus hyperbolic embeddings to be explored 
[IB1[23H25|. Finally, given its local nature, the algorithm is ideally suited for parallelisation and 
GPU computing. 

This paper is organized as follows. In Section II we discuss some facts about Planar Maximally 
Filtered Graphs, describing two algorithms used to generate such graphs; in Section III we 
introduce the TMFG algorithm and highlight some characteristics of the algorithm that are 
relevant to applications with high-dimensional, frequently updated datasets; in Section IV we 
offer some information-theoretic perspectives on the selection of a particular score function; finally, 
in Section V we apply TMFG to several weight distributions, showing that it is computationally 
faster than PMFG while achieving comparable or better results. 


II. PLANAR MAXIMALLY FILTERED GRAPHS 

Algorithms for the extraction of planar subgraphs from dense networks are relevant in several 
domains such as, for instance: i) the analysis of financial data, where nodes generally represent 
financial assets (such as stock prices, spreads, liabilities, risk or liquidity indicators etc.) and the 
edges represent correlations or other measures of dependence between them (see [3101 [TH US]); ii) 
facilities layout, where nodes represent the facilities and the edges the affinities between them 
(see [37H33] for a survey); iii) integrated circuit design, where nodes are the electrical elements 
and edges are the physical connections (see IdOj); iv) systems biology, where nodes can 
represent proteins and edges protein interactions in a metabolic network (see m); v) social 
systems, where nodes represent social agents (e.g. individuals, companies, groups) and edges 
represent social interactions (see [33] for a detailed overview). In some domains, such as facility 
layout or integrated circuit design, the constraint of planarity is a direct consequence of the 
two-dimensional planar geometry of the problem, while in other domains network planarity is a way 
to constrain the complexity of the graph, reducing the degree of interwovenness [3]. Planarity 
is also desirable because many NP-hard problems have efficient polynomial-time solutions, or 
better approximations, for planar graphs (vertex coloring, edge coloring, independent vertex set, 
multicommodity flows, see [33] for an introduction). 

Let us here briefly review two known algorithms that have been used to construct planar 
filtered networks from a matrix of weights: 

• PMFG lU; 

• Deltahedron heuristics and subsequent improvement [7105103] . 

These algorithms provide estimates for optimal solutions to the Maximum Weighted Planar 
Graph problem. Let us recall that, given a complete edge-weighted graph $G(V,E)$ with 
vertex set $V$ and edge set $E$, the Maximum Weighted Planar Graph problem requires one to 
build a planar subgraph $G'(V,E')$, with $E' \subset E$, such that adding any other edge $e \in E \setminus E'$ would 
cause $G(V, E' \cup e)$ to be non-planar and such that the sum of the weights is maximum (see 03] 
and [7] for a detailed description of the problems and a survey). 


A. Planar Maximally Filtered Graph 

The PMFG algorithm [5] searches for the maximum weighted planar subgraph by adding edges 
one by one (see 0). The resulting matrix is sparse, with $3(p-2)$ edges. The algorithm starts by 
sorting all the edges of a dense matrix of weights in non-increasing order of weight and tries 
to insert every edge in the PMFG in that order. Edges that violate the planarity constraint are 
discarded. The most computationally intensive part of the algorithm is the planarity test, which 
is performed every time an edge insertion is attempted. As a result, the PMFG construction 
performs $O(p^2)$ planarity tests on any dense $p \times p$ matrix of weights $W$. Assuming 
that the complexity of a planarity test is $O(p)$ (see [MIES]), the computational complexity of 
the whole algorithm is $O(p^3)$ [9]. 

FIG. 1: $T_2$ move: addition of one vertex within a triangular face |16[ I2.4[ [2^ 1381139| . Its inverse, $T_2^{-1}$, 
removes a vertex from inside a three-clique (in this case the clique $\{v_1, v_2, v_3\}$). 


B. Deltahedron heuristic 


The deltahedron heuristic [281 El] searches for approximate solutions of the WMPG problem 
starting from a tetrahedron, $K_4$, which is planar. Then, at each step, a vertex is inserted into a 
triangular face and three edges are added, connecting the newly inserted vertex to the vertices 
of the triangular face. This vertex insertion in a triangular face is called a $T_2$ move (see Fig. 1 
and [m |23l |2ll EHl El]). It is easy to see that the $T_2$ operator acts without breaking planarity, 
ensuring that the final network is planar. The triangular face is chosen so as to maximise 
the sum of the weights of the newly added edges, while the vertices to be inserted are extracted from a 
pre-sorted list. The vertex list can be sorted according to one of two functions of the edge weights 
incident to the vertex, yielding two possible variants of the deltahedron heuristic: (i) the sum of 
the incident edge weights or (ii) the maximum incident edge weight. Different weightings lead 
to different orderings of the vertices and different results. 
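
The steps above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the weight matrix is a symmetric list of lists, the seed tetrahedron is taken to be the four top-ranked vertices (the text does not specify how the seed is chosen), and the names `deltahedron` and `order_by` are hypothetical.

```python
from itertools import combinations


def deltahedron(W, order_by="sum"):
    """Deltahedron heuristic sketch; returns the set of retained edges."""
    p = len(W)

    def rank(v):
        # Variant (i): sum of incident weights; variant (ii): maximum incident weight
        incident = [W[v][u] for u in range(p) if u != v]
        return sum(incident) if order_by == "sum" else max(incident)

    # The vertex order is fixed once, at the beginning
    order = sorted(range(p), key=rank, reverse=True)
    # Assumption: the seed tetrahedron is made of the four top-ranked vertices
    seed, rest = order[:4], order[4:]
    edges = {frozenset(e) for e in combinations(seed, 2)}
    faces = [frozenset(f) for f in combinations(seed, 3)]
    for v in rest:
        # Choose the face that maximises the weight of the three new edges
        best = max(faces, key=lambda f: sum(W[v][u] for u in f))
        faces.remove(best)
        a, b, c = best
        # T2 move: the chosen face is replaced by three new faces
        faces += [frozenset((a, b, v)), frozenset((a, c, v)), frozenset((b, c, v))]
        edges |= {frozenset((v, u)) for u in best}
    return edges


W = [
    [0, 7, 8, 9, 1, 6],
    [7, 0, 6, 5, 2, 5],
    [8, 6, 0, 4, 3, 4],
    [9, 5, 4, 0, 10, 3],
    [1, 2, 3, 10, 0, 5],
    [6, 5, 4, 3, 5, 0],
]
edges_sum = deltahedron(W, order_by="sum")
edges_max = deltahedron(W, order_by="max")
```

Since no planarity test is ever performed, the cost is dominated by the search for the best face at each insertion. Both variants return $3p-6$ edges by construction.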

The deltahedron heuristic is not “greedy” with respect to edge insertion, since the 
ordering of the vertices is chosen once at the beginning and there is no subsequent 
attempt to optimise the order of the vertices taking into account the local configuration; 
moreover, there is no known performance guarantee. However, the algorithm is considerably faster than the 
PMFG, since every $T_2$ move preserves the planarity of the graph and therefore there is no need to 
test for planarity at each stage. 

An important feature of the graphs produced by $T_2$ moves is that they are chordal graphs: 
every cycle of length greater than 3 has a chord, i.e. an edge not belonging to the cycle that joins 
two non-adjacent vertices of the cycle. Chordal graphs are perfect graphs and as such there are polynomial-time 
algorithms for solving generally hard problems such as finding a maximum clique, graph 
coloring, and maximum independent set. 

In [3110] the deltahedron heuristic is improved by keeping a data structure of the most effective 
ways of inserting a vertex inside the faces (Green and Al-Hakim heuristic - the GH-heuristic 
henceforth), essentially keeping a cache of the best and next-to-best options for inserting any of 
the remaining vertices. The cache is updated as the algorithm progresses. Optionally Osman et 
al. [3 allow for a parameter that governs the “greediness” of the algorithm. In section III we will 
introduce a modified version of the GH-heuristic algorithm. 

FIG. 3: $T_1$ move: rewiring of a shared edge between neighboring triangular faces. 


C. Local topological moves: $T_1$, $T_2$, $A$, & $S$ 

With the deltahedron heuristic we have already introduced the $T_2$ move that, as shown in 
Fig. 1, inserts vertex $v_4$ into the triangular face $\{v_1,v_2,v_3\}$, splitting it into three triangular faces 
$\{v_1,v_2,v_4\}$, $\{v_1,v_4,v_3\}$ and $\{v_4,v_2,v_3\}$. In the following we will call a face a three-clique that 
does not contain any vertex in the given embedding, reserving the word triangle for a generic 
cycle of length 3. We see that, after the $T_2$ move, $\{v_1,v_2,v_3\}$ is no longer a face but rather a 
3-clique. 
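
As an illustration, a $T_2$ move can be written as a small function acting on a list of faces. This is a sketch only: faces are represented as vertex triples and the name `t2_move` is introduced here for convenience.

```python
def t2_move(faces, face, v):
    """Insert vertex v inside `face`; return the updated faces and the three new edges."""
    a, b, c = face
    # The split face is no longer a face (it becomes a 3-clique containing v)
    new_faces = [f for f in faces if f != face]
    new_faces += [(a, b, v), (a, v, c), (v, b, c)]
    new_edges = [(v, a), (v, b), (v, c)]
    return new_faces, new_edges


# Tetrahedron on vertices 0..3; vertex 4 is inserted into the face (1, 2, 3)
faces = [(1, 2, 3), (1, 2, 0), (1, 3, 0), (2, 3, 0)]
faces, edges = t2_move(faces, (1, 2, 3), 4)
```

Each move removes one face and adds three, so the number of faces grows by two and the number of edges by three per inserted vertex.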

In an extension of the deltahedron heuristic suggested by Leung [41], vertex insertion 
can happen either one vertex at a time (the $T_2$ move, as in Fig. 1) or three vertices at a time, as in 
Fig. 2. The latter corresponds to the insertion of an octahedron within a triangular face [5^; clearly this 
is different from the $T_2$ move, which instead corresponds to the insertion of a tetrahedron. However, 
such a move can be obtained by combining $T_2$ with another local move, called $T_1$ [HI |23l [Ml EH], 
which consists in switching neighbors between two adjacent triangles, as shown in Fig. 3. In general, 
any local topological change of a surface triangulation that preserves the embedding and results in 
a triangulation can be realized through a combination of the two elementary moves $T_1$ and 
$T_2$ m. However, it should be pointed out that the application of $T_1$ could cause the graph to 
become no longer chordal. For instance, the Leung extension (Fig. 2) can be produced via two 
$T_2$ and one $T_1$, as demonstrated in Fig. 4. 

FIG. 4: Demonstration that Leung’s extension in Fig. 2 can be generated by using two $T_2$ and one 
$T_1$ moves. 




FIG. 5: A move: insertion of a vertex inside a plaquette made of two neighbouring triangular faces. 


Another move that we will use to build planar graphs is the $A$ move, as described in Fig. 5. Also 
in this case the move can be produced by combining $T_1$ and $T_2$, and it leads to non-chordal graphs. 

Finally, we will use the ‘swap’ operator, $S$, that relabels subsets of vertices of a graph, as 
shown in Fig. 6, where it acts on the vertices of a 4-simplex. This operation is trivial when 
the weights are identical, but will in general affect aggregate functions of the weights in a 
non-trivial way. A peculiarity of this operator is that it does not have to operate locally and that it leaves 
the topology unchanged, therefore preserving planarity. 

In the following section we shall see how $T_2$, $T_1$, $A$ and $S$ moves can be used to generate planar 
filtered graphs as well as higher-genus, non-planar, filtered graphs. 

FIG. 6: $S$ move: relabelling of the vertices of a 4-simplex. Note that the topology of the graph is 
unchanged. 


III. TRIANGULATED MAXIMALLY FILTERED GRAPH 


A. TMFG construction 


The TMFG algorithm starts from a clique of order 4 ($K_4$) and adds vertices by using the local 
move $T_2$. The novelty is that, at each step, the algorithm optimizes a score function (e.g. the 
sum of the weights of the edges). Similarly to the GH-heuristic, the method does not rely 
on any particular ordering of the vertices but, at every step, it calculates the score that would 
be obtained by adding any of the remaining vertices inside any feasible face. $T_2$ is applied to 
the vertex and face pair that leads to the maximum increase in score. A naive implementation 
would need to evaluate the gain function for every pair consisting of a feasible vertex and a 
feasible face, thus requiring $O(p^2)$ calculations at every step and therefore an $O(p^3)$ overall 
computational complexity. However, it is possible to maintain and incrementally update a cache 
with the information about the best possible pairing, updating only the records affected by a 
move. Assuming that finding the maximum requires $O(p)$ calculations for $p$ vertices, the overall number of 
calculations of the score function is $O(p^2)$. This results in much faster computational times 
with respect to the PMFG. Unlike [7], we use a slightly different data structure to 
keep track of the vertices to insert into the feasible faces. We also keep track of the triangles 
that are no longer faces because this is relevant for subsequent modelling (see Section IV). 


Let us first focus on $T_2$ moves only. After every application of $T_2$ the cache is updated: some 
scores that were previously achievable are no longer feasible, while others become feasible and 
the corresponding score is calculated. More formally, we define a score function $S(v_h, \{v_a, v_b, v_c\})$ 
that quantifies the gain achievable by adding vertex $v_h$ inside the triangle $\{v_a, v_b, v_c\}$. 

For instance, for a given dense matrix of weights $W$, the gain function can be the sum of the 
weights of the edges that will be added by inserting $v_h$ in face $\{v_a, v_b, v_c\}$: $S(v_h, \{v_a, v_b, v_c\}) = 
W(v_h, v_a) + W(v_h, v_b) + W(v_h, v_c)$. In Section IV we discuss an information-theoretic 
interpretation of the score function. 

The cache is a structure made up of two vectors (MaxGain and BestVertex) indexed by 
the faces in the planar graph present up to that point. Let us consider a given stage of the 
construction with $m$ triangular faces $t_i$, $i \in \{1, 2, \cdots, m\}$, and $k$ remaining uninserted vertices 
$v \in \{v_1 \cdots v_k\}$. The MaxGain vector contains the value of the maximum gain over all remaining 
vertices for all triangular faces: 

$$
\mathrm{MaxGain} = \left[ \max_{v \in \{v_1 \cdots v_k\}} S(v, t_1),\ \max_{v \in \{v_1 \cdots v_k\}} S(v, t_2),\ \cdots,\ \max_{v \in \{v_1 \cdots v_k\}} S(v, t_m) \right] \tag{1}
$$

The BestVertex vector contains the vertices that attain the maximum gain for 
each triangular face: 

$$
\mathrm{BestVertex} = \left[ \arg\max_{v \in \{v_1 \cdots v_k\}} S(v, t_1),\ \arg\max_{v \in \{v_1 \cdots v_k\}} S(v, t_2),\ \cdots,\ \arg\max_{v \in \{v_1 \cdots v_k\}} S(v, t_m) \right] \tag{2}
$$
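
The two cache vectors of Eqs. (1) and (2) can be built as in the following minimal sketch. It assumes the sum-of-weights score function and a symmetric weight matrix stored as a list of lists; the names `build_cache`, `max_gain` and `best_vertex` are illustrative, and the cache is stored as two dictionaries keyed by face rather than as vectors.

```python
def build_cache(W, faces, remaining):
    """Return (max_gain, best_vertex): for every face, the best gain and the vertex attaining it."""

    def gain(v, face):
        # Sum of the weights of the three edges created by inserting v in `face`
        return sum(W[v][u] for u in face)

    max_gain, best_vertex = {}, {}
    for face in faces:
        best = max(remaining, key=lambda v: gain(v, face))
        best_vertex[face] = best
        max_gain[face] = gain(best, face)
    return max_gain, best_vertex


W = [
    [0, 7, 8, 9, 1, 6],
    [7, 0, 6, 5, 2, 5],
    [8, 6, 0, 4, 3, 4],
    [9, 5, 4, 0, 10, 3],
    [1, 2, 3, 10, 0, 5],
    [6, 5, 4, 3, 5, 0],
]
faces = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]  # faces of the tetrahedron 0..3
remaining = [4, 5]                                    # vertices not yet inserted
max_gain, best_vertex = build_cache(W, faces, remaining)
```

In this toy example vertex 5 is the best candidate for the faces (0, 1, 2) and (0, 1, 3), while vertex 4 is the best candidate for the other two faces.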


When a vertex (say vertex $v_h$) is added to a certain triangular face (say face $t_j$), the two cache 
vectors must be updated by removing vertex $v_h$ from the list of remaining vertices, removing 
face $t_j$ and adding three new faces. It is worth noting that $t_j$ becomes a clique separator of the 
graph. The TMFG pseudocode is shown in Algorithm 1. For simplicity we have not given 
details of the application of the moves $T_1$, $A$ and the swap operator $S$. The TMFG generated 
by using $T_2$ only is a 4-clique tree. 

input : A dense p x p square matrix W with positive weights (e.g. a matrix of squared correlation 
        coefficients) 
output: A sparse matrix P, a filtered version of W fulfilling the planarity constraint 

C_1 ← tetrahedron {v_1, v_2, v_3, v_4} with highest overall total score; 
// Assign the four triangular faces in C_1 to the array T 
T ← {{v_1, v_2, v_3}, {v_1, v_2, v_4}, {v_1, v_3, v_4}, {v_2, v_3, v_4}}; 
// Put the p − 4 vertices not belonging to C_1 in the array V 
V ← {v_5, ..., v_p}; 
// Create an empty list of separators 
S ← ∅; 
// Assign the first tetrahedron to the list of cliques 
C ← C_1; 
P ← W(C_1, C_1); 
Calculate MaxGain for T and V as in Eq. (1); 
Calculate BestVertex for T and V as in Eq. (2); 
// Insert p − 4 vertices via T2 
while V is not empty do 
    Find the t_j ∈ T and v_i ∈ V that achieve the maximum in MaxGain; 
    Insert v_i into t_j; // this creates three new triangles t_a, t_b, t_c 
    V ← V \ v_i; 
    T ← (T \ {t_j}) ∪ {t_a, t_b, t_c}; 
    S_i ← {t_j}; 
    S ← S ∪ S_i; 
    C_i ← {t_j, t_a, t_b, t_c}; 
    C ← C ∪ C_i; 
    P ← P + W(C_i, C_i) − W(S_i, S_i); 
    Update MaxGain and BestVertex to reflect the changes in T and V; 
end 
return P; 

Algorithm 1: TMFG algorithm 
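
For readers who prefer executable code, the pseudocode can be transcribed into a compact Python sketch restricted to $T_2$ moves and to the sum-of-weights score. Several choices below are assumptions made for illustration and are not prescribed by Algorithm 1: the seed tetrahedron is found by brute force over all 4-subsets (practical only for small $p$), the cache is a dictionary keyed by face, and the function returns the edge set, the cliques and the separators instead of the sparse matrix.

```python
from itertools import combinations


def tmfg(W):
    """T2-only TMFG sketch for a symmetric p x p weight matrix W (list of lists), p >= 4."""
    p = len(W)

    def gain(v, face):
        return sum(W[v][u] for u in face)

    # Assumption: brute-force search of the tetrahedron with the highest total weight
    seed = max(combinations(range(p), 4),
               key=lambda c: sum(W[a][b] for a, b in combinations(c, 2)))
    faces = list(combinations(seed, 3))
    remaining = set(range(p)) - set(seed)
    edges = set(combinations(seed, 2))
    cliques, separators = [tuple(seed)], []

    def best_for(face):
        v = max(remaining, key=lambda x: gain(x, face))
        return gain(v, face), v

    # cache[face] = (MaxGain, BestVertex) for that face
    cache = {f: best_for(f) for f in faces} if remaining else {}
    while remaining:
        face = max(cache, key=lambda f: cache[f][0])
        _, v = cache.pop(face)          # the face is consumed by the T2 move
        remaining.discard(v)
        a, b, c = face
        new_faces = [(a, b, v), (a, c, v), (b, c, v)]
        edges |= {tuple(sorted((v, u))) for u in face}
        separators.append(face)         # the old face becomes a 3-clique separator
        cliques.append((a, b, c, v))    # the new 4-clique
        if remaining:
            # Only faces whose best vertex was v, plus the new faces, need an update
            stale = [f for f, (_, bv) in cache.items() if bv == v]
            for f in stale + new_faces:
                cache[f] = best_for(f)
    return edges, cliques, separators


W = [
    [0, 7, 8, 9, 1, 6],
    [7, 0, 6, 5, 2, 5],
    [8, 6, 0, 4, 3, 4],
    [9, 5, 4, 0, 10, 3],
    [1, 2, 3, 10, 0, 5],
    [6, 5, 4, 3, 5, 0],
]
edges, cliques, separators = tmfg(W)
```

The result has $3p-6$ edges, $p-3$ four-cliques and $p-4$ separators, as expected for a 4-clique tree.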

The TMFG algorithm can be extended to include $T_1$ and $A$ moves as well. In this case, 
the moves are local, internal to the plaquette made by two joined triangles (i.e. $\{v_1,v_2,v_3\}$ and 
$\{v_2,v_3,v_4\}$ in Fig. 3). The gain function for a $T_1$ move is associated with the removal of an edge 
(i.e. $(v_1,v_3)$ in Fig. 3) and the simultaneous addition of another edge (i.e. $(v_2,v_4)$ in Fig. 3). 
Similarly, the gain for a move of type $A$ (as shown in Fig. 5) results from the removal of one edge 
and the insertion of a new vertex and four new edges. The use of $T_1$ and $A$ moves generally 
improves the gain; however, we have verified that the algorithm with $T_2$ only produces very similar 
results. Furthermore, planar filtered graphs built with $T_1$ or $A$ moves are no longer clique trees but 
rather bubble-trees, which are in general no longer chordal. For instance, see Figj^, where the 
application of $T_1$ creates a non-chordal graph: the cycle vi - V 2 - vg - V 5 - vi has length greater 
than 3 without internal chords. This can have some implications for dependency modeling, as 
we shall discuss in Section IV. In the following we will therefore consider separately the two 
cases of TMFG constructed with and without $T_1$ and $A$. In many cases the application of the 
swap operator $S$ results in higher overall gains. This operator has the advantage of leaving the 
overall topology unchanged, but its use should be regulated by a few local or heuristic criteria to 
avoid an increase in the complexity of the algorithm due to the growing number of possible 
combinations. The case that we have implemented requires the evaluation of all the possible 
combinations of the four vertices involved in the execution of a $T_2$ operation. This requires some 
further changes to the cache vectors but, being applied locally, it does not increase the overall 
computational complexity, which remains $O(p^2)$. 
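
Under the sum-of-weights score, the gains of the $T_1$ and $A$ moves described above reduce to simple differences of weights. The sketch below is illustrative only; the function names `t1_gain` and `a_gain` and the toy matrix are assumptions.

```python
def t1_gain(W, removed_edge, added_edge):
    """Gain of a T1 move: weight of the edge added minus weight of the edge removed."""
    (a, b), (c, d) = removed_edge, added_edge
    return W[c][d] - W[a][b]


def a_gain(W, v, plaquette, removed_edge):
    """Gain of an A move: four new edges from v to the plaquette, minus the removed edge."""
    a, b = removed_edge
    return sum(W[v][u] for u in plaquette) - W[a][b]


W = [
    [0, 7, 8, 9, 1],
    [7, 0, 6, 5, 2],
    [8, 6, 0, 4, 3],
    [9, 5, 4, 0, 10],
    [1, 2, 3, 10, 0],
]
g_t1 = t1_gain(W, removed_edge=(0, 2), added_edge=(1, 3))
g_a = a_gain(W, v=4, plaquette=(0, 1, 2, 3), removed_edge=(0, 2))
```

A move would be accepted only when its gain is positive; here the $T_1$ move has negative gain and would be rejected, while the $A$ move has positive gain.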

The TMFG algorithm is not greedy with respect to edge insertion in the sense that the best 
possible move is chosen from a subset of all the feasible edge insertions that preserve planarity. 
Nonetheless, we shall see that TMFG performs as well as - or better than - the PMFG for a 
large class of weight matrices, including squared correlation coefficient matrices from empirical 
time series which are relevant for modeling [42]. 


B. Dynamical adaptability 

Due to the local nature of the operators $T_1$, $T_2$, $A$, $A^{-1}$ and (local) $S$ used to construct 
the TMFG, one can continuously modify the network, allowing ‘online’ adaptability while new data 
are generated. This is of practical importance because in real big-data applications information 
changes dynamically, with new data continuously fed in, causing changes in the matrix of weights 
that require modifications of the filtered graph. Further, the creation of new nodes is required 
when new elements/variables become relevant in the system. Conversely, elements/variables 
can eventually become irrelevant and the corresponding vertices should be eliminated from the 
graph. The implementation of these moves requires keeping a cache matrix of gains continuously 
updated and dynamically checking for moves that improve total gains. 


C. Parallelization and big data 

The local nature of TMFG construction and dynamical adaptation through $T_1$, $T_2$, 
$A$, $A^{-1}$ and (local) $S$ moves makes it ideal for parallelization. There are several possibilities for 
parallelization, and implementing a parallel algorithm for TMFG is beyond the scope of the present work. 
Let us however briefly discuss a possible parallel implementation of the TMFG. One 
of the main features of planar triangulations is that three-cliques uniquely divide the network into 
two ‘inside’ and ‘outside’ subgraphs within a nested hierarchical structure [43]. This means that, 
given a seed structure of three-cliques, each clique can develop its inside subgraph independently. 
A processor can be assigned to each seed clique and calculations can be performed locally. Given 
that each separating clique divides the graph roughly into two parts, one can compute the TMFG 
in $O(p)$ using $O(\log p)$ processors. Another issue related to big data is the size of the score 
vectors. It is clear from the construction that the size of the cache grows linearly with the 
dimension of the problem and that triangles in the basis can be assigned to different processors, 
allowing parallel updates of the cache. 

D. Memory usage 

In the case of pair-wise dependence (such as correlation) both the deltahedron heuristic and 
the PMFG require the entire correlation matrix to be computed in advance, while the TMFG does 
not use the full information from the correlation matrix and could calculate only the correlations 
necessary for the incremental update of the gain vectors. This is already an advantage for 
correlation measures in the (numerous) cases where the number of observations ($q$) is less than 
the number of variables ($p$): in fact it could require much less memory (approximately $p \times q$) to 
store the time series of the observations and calculate the correlations on demand, rather than 
calculating and storing a large correlation matrix (approximately $p \times p$). This fact is even 
more relevant for multi-point dependencies (e.g. partial correlation, mutual information, ...): in 
these cases the TMFG would still only need to store the time series in memory and would 
require the calculation of the relevant gain functions only, while other methods would require 
the storage of large amounts of data (e.g. of order $p^3$ for a three-point dependency measure). 
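
The on-demand strategy can be sketched as follows. The sketch assumes Pearson correlation, non-constant series of equal length, and a hypothetical helper named `make_corr`; only the $p \times q$ observations plus two vectors of length $p$ are kept in memory.

```python
from math import sqrt


def make_corr(series):
    """series: list of p lists with q observations each. Returns a function corr(i, j)."""
    q = len(series[0])
    means = [sum(x) / q for x in series]
    # Norm of each centred series: O(p) additional storage
    norms = [sqrt(sum((xi - m) ** 2 for xi in x)) for x, m in zip(series, means)]

    def corr(i, j):
        # Computed only when a gain involving the pair (i, j) is needed
        cov = sum((a - means[i]) * (b - means[j])
                  for a, b in zip(series[i], series[j]))
        return cov / (norms[i] * norms[j])

    return corr


series = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [4.0, 3.0, 2.0, 1.0],
]
corr = make_corr(series)
```

Each call costs $O(q)$ operations, so this trades computation for memory; whether the trade is worthwhile depends on how many distinct pairs the gain updates actually request.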


IV. MODELING WITH TMFG: INFORMATION THEORETIC PERSPECTIVE 


In complex systems, such as financial markets, a large number of interdependent variables are 
typically involved. The TMFG is a way of filtering the structure of interrelation between the 
variables reducing it to a network of most relevant interactions. 

Modeling the system statistically consists in identifying the joint probability distribution that 
best describes the observed collective behavior of the variables. 

Specifically, given a set of observations $\{x_1(1),...,x_1(q)\}$, $\{x_2(1),...,x_2(q)\}$, ..., 
$\{x_p(1),...,x_p(q)\}$ of $p$ random variables $\mathbf{X} = \{X_1, X_2, ..., X_p\}$, one aims to estimate a 
joint probability distribution function $Q(\mathbf{X})$ that is the best representation of the ‘true’ 
multivariate probability distribution function $P(\mathbf{X})$ from which the set of observations is 
drawn. Clearly, $P(\mathbf{X})$ is unknown and the only information available is the set of observations 
$\{x_1(1),...,x_1(q)\}$, $\{x_2(1),...,x_2(q)\}$, ..., $\{x_p(1),...,x_p(q)\}$, from which $Q(\mathbf{X})$ must be estimated. 

Information filtering graphs can be used to compute $Q(\mathbf{X})$. The main advantage is that these 
graphs are locally low dimensional (e.g. the largest clique is $K_4$ when planarity is enforced), which 
makes sampling tractable even with a limited amount of data [42]. Further, the TMFG constructed 
from $T_2$ moves results in a tree made of 4-cliques separated by 3-cliques (called separators in 
the following). This is a particular case of a triangulated, or chordal, or decomposable graph [22]. 
From the theory of graphical models (see [18]) we know that the probability distribution function 
associated with a decomposable graph, such as the TMFG, admits the representation: 


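$$
Q(\mathbf{X}) = \frac{\prod_{c \in \mathrm{Cliques}} \phi_c(\mathbf{X}_c)}{\prod_{s \in \mathrm{Separators}} \phi_s(\mathbf{X}_s)} \tag{3}
$$

where $\phi_c(\mathbf{X}_c)$ and $\phi_s(\mathbf{X}_s)$ are the marginal probabilities of the subsets of variables associated 
respectively with the 4-clique $\mathbf{X}_c$ and the separator $\mathbf{X}_s$. 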
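
The factorization of Eq. (3) can be checked numerically on a toy decomposable graph. The sketch below deliberately uses a much smaller structure than the TMFG: a chain of three binary variables with two 2-cliques and one single-vertex separator, instead of 4-cliques and 3-clique separators. The probability tables and helper names are assumptions chosen for illustration.

```python
from itertools import product

# Toy joint distribution over three binary variables forming a chain 0 - 1 - 2
p_a = {0: 0.6, 1: 0.4}
p_b_given_a = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}
p_c_given_b = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}
P = {(a, b, c): p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]
     for a, b, c in product((0, 1), repeat=3)}


def marginal(joint, keep):
    """Marginal of `joint` over the variable positions listed in `keep`."""
    out = {}
    for x, prob in joint.items():
        key = tuple(x[i] for i in keep)
        out[key] = out.get(key, 0.0) + prob
    return out


cliques = [(0, 1), (1, 2)]
separators = [(1,)]
phi_c = {c: marginal(P, c) for c in cliques}
phi_s = {s: marginal(P, s) for s in separators}


def Q(x):
    """Product of clique marginals divided by product of separator marginals."""
    num = 1.0
    for c in cliques:
        num *= phi_c[c][tuple(x[i] for i in c)]
    den = 1.0
    for s in separators:
        den *= phi_s[s][tuple(x[i] for i in s)]
    return num / den


max_error = max(abs(Q(x) - P[x]) for x in P)
```

Because the toy distribution is Markov with respect to the chain, $Q$ reproduces $P$ exactly (up to rounding); for a general $P$ the factorized $Q$ is only an approximation.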

Eq. (3) reduces the $p$-dimensional problem of estimating the joint probability distribution function 
$Q(\mathbf{X})$ to the estimation of a set of 3- and 4-dimensional local marginal probabilities $\phi_s(\mathbf{X}_s)$ 
and $\phi_c(\mathbf{X}_c)$. Such a dimensionality reduction helps greatly in the estimation of the joint 
probability. The open question is now how to measure how well the unknown true joint distribution 
$P(\mathbf{X})$ is represented by the model estimate $Q(\mathbf{X})$ factorized over the TMFG. To this end we 
can measure the dissimilarity between the two probability distributions, which is given by 
the Kullback-Leibler divergence [44]: 

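$$
D_{KL}(P\|Q) = \sum_{\mathbf{X}} P(\mathbf{X}) \log\left(\frac{P(\mathbf{X})}{Q(\mathbf{X})}\right) \tag{4}
$$

For simplicity we are considering discrete variables; a similar treatment can be developed for 
continuous variables. The goal is to construct the TMFG that minimizes this divergence. By 
substituting Eq. (3) into Eq. (4) we obtain 

$$
\begin{aligned}
D_{KL}(P\|Q) = {} & \sum_{\mathbf{X}} P(\mathbf{X}) \log(P(\mathbf{X})) \\
& - \sum_{c \in \mathrm{Cliques}} \sum_{\mathbf{X}_c} P(\mathbf{X}_c) \log(\phi_c(\mathbf{X}_c)) \\
& + \sum_{s \in \mathrm{Separators}} \sum_{\mathbf{X}_s} P(\mathbf{X}_s) \log(\phi_s(\mathbf{X}_s)) .
\end{aligned} \tag{5}
$$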

From an information-theoretic perspective, the first term in Eq. (5) (with a minus sign), 

$$
H = -\sum_{\mathbf{X}} P(\mathbf{X}) \log(P(\mathbf{X})) , \tag{6}
$$

quantifies the total amount of uncertainty in the system, measuring the number of bits (if 
base-2 logarithms are used) necessary to define a state. The other two terms in Eq. (5), 

$$
H_M = -\sum_{c \in \mathrm{Cliques}} \sum_{\mathbf{X}_c} P(\mathbf{X}_c) \log(\phi_c(\mathbf{X}_c)) + \sum_{s \in \mathrm{Separators}} \sum_{\mathbf{X}_s} P(\mathbf{X}_s) \log(\phi_s(\mathbf{X}_s)) , \tag{7}
$$

also quantify an uncertainty, but in this case one associated with the model of the system. In 
other words, by adopting the TMFG structure of interactions, $H_M$ measures the number of 
bits necessary to define the state of the system when only the interrelations among variables 
associated with edges in the TMFG are considered. 
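
The decomposition $D_{KL}(P\|Q) = H_M - H$ implied by Eqs. (5)-(7) can be verified on a small discrete example. As before, the sketch uses a toy chain of three binary variables (two 2-cliques, one single-vertex separator) rather than an actual TMFG, takes the clique and separator factors to be the marginals of $P$, and uses an arbitrary joint distribution chosen for illustration.

```python
from itertools import product
from math import log2

states = list(product((0, 1), repeat=3))
# Arbitrary joint distribution (weights sum to 36), not Markov on the chain 0 - 1 - 2
P = {x: w / 36 for x, w in zip(states, (8, 2, 3, 5, 1, 6, 4, 7))}


def marginal(keep):
    """Marginal of P over the variable positions listed in `keep`."""
    out = {}
    for x, prob in P.items():
        key = tuple(x[i] for i in keep)
        out[key] = out.get(key, 0.0) + prob
    return out


cliques, separators = [(0, 1), (1, 2)], [(1,)]


def sum_p_log_phi(keep):
    # sum over X_keep of P(X_keep) * log2(phi(X_keep)), with phi the marginal of P
    return sum(prob * log2(prob) for prob in marginal(keep).values())


H = -sum(prob * log2(prob) for prob in P.values())                  # Eq. (6)
H_M = (-sum(sum_p_log_phi(c) for c in cliques)
       + sum(sum_p_log_phi(s) for s in separators))                 # Eq. (7)


def Q(x):
    """Factorized model of Eq. (3) on the toy chain."""
    num = 1.0
    for c in cliques:
        num *= marginal(c)[tuple(x[i] for i in c)]
    den = 1.0
    for s in separators:
        den *= marginal(s)[tuple(x[i] for i in s)]
    return num / den


D_KL = sum(P[x] * log2(P[x] / Q(x)) for x in states)                # Eq. (4)
```

Here `D_KL` coincides with `H_M - H` up to rounding and is strictly positive, since the chosen $P$ does not factorize over the chain.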

An algorithm to construct the TMFG with the aim of minimizing $D_{KL}(P\|Q)$ can be 
implemented by choosing at every stage the move that minimally increases the divergence, 
consistently with all other constraints. In particular, considering the TMFG construction via 
T2 moves, the insertion of a vertex v inside an existing triangular face t, generating a 
4-clique u, changes $D_{KL}(P\|Q)$ by $-\Delta(v,t)$, where 

$$\Delta(v,t) = \sum_{X_u} \phi_u(X_u)\,\log\big(\phi_u(X_u)\big) - \sum_{X_t} \phi_t(X_t)\,\log\big(\phi_t(X_t)\big) . \qquad (8)$$

From an information-theoretic perspective, $-\Delta$ is the amount of uncertainty introduced in 
the model by including a variable v (it is the conditional entropy of $X_v$ given $X_t$); the 
TMFG structure should be constructed so as to minimize such uncertainty. In [42] the case of 
multivariate normal distributions is discussed in detail. 
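
As an illustration of this selection rule, the sketch below estimates the uncertainty added by a candidate insertion from discrete samples, using plug-in (frequency) entropies. It is a sketch under assumptions, not the authors' implementation: the function names, the synthetic binary data and the choice of a plug-in estimator are all ours.

```python
from collections import Counter

import numpy as np


def entropy_bits(data, cols):
    """Plug-in entropy (in bits) of the joint distribution of the columns `cols`."""
    counts = Counter(map(tuple, data[:, list(cols)]))
    freq = np.array(list(counts.values()), dtype=float) / len(data)
    return float(-np.sum(freq * np.log2(freq)))


def uncertainty_added(data, v, face):
    """H(X_u) - H(X_t) with u = t + {v}: uncertainty added by inserting v into face t."""
    return entropy_bits(data, tuple(face) + (v,)) - entropy_bits(data, face)


def best_vertex_for_face(data, face, candidates):
    """Candidate vertex whose insertion into `face` adds the least uncertainty."""
    return min(candidates, key=lambda v: uncertainty_added(data, v, face))


# Hypothetical discrete data: 2000 samples of 6 binary variables.
rng = np.random.default_rng(1)
data = rng.integers(0, 2, size=(2000, 6))
# Variable 4 is made a deterministic function of variables in the face (0, 1, 2).
data[:, 4] = data[:, 0] ^ data[:, 1]

best = best_vertex_for_face(data, (0, 1, 2), [3, 4, 5])
print(best)
```

In this synthetic example variable 4 is fully determined by the face, so inserting it adds no uncertainty, whereas the independent variables 3 and 5 each add close to one bit.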


V. EXAMPLES OF TMFG CONSTRUCTION AND COMPARISON WITH PMFG 

A vast literature has demonstrated that PMFG can be used to retrieve meaningful information 
about the structure of interdependency in complex datasets [H IMI^ , it is therefore natural to 
compare the performance of the TMFG with that of the PMFG. 

Let us first look at the scaling of execution times for the TMFG and PMFG algorithms as a 
function of the size p of the weight matrix W; results are reported in Fig. 7 (seconds on a 
2.6 GHz Intel Core i7®). We observe that TMFG execution times scale with the matrix dimension 
p approximately as $O(p^2)$, while PMFG scales approximately as $O(p^3)$. The 2-parameter best 
polynomial fits give respectively $T_{TMFG} \simeq 2 \cdot 10^{-\ldots} \cdot p^2 + 6 \cdot 10^{-\ldots}$ 
and $T_{PMFG} \simeq 2 \cdot 10^{-\ldots} \cdot p^3 + 3 \cdot \ldots$ Overall we can see that TMFG 
is several orders of magnitude faster than PMFG. 
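
A two-parameter fit of this kind can be obtained by ordinary least squares once the exponent is fixed. The sketch below assumes the fitted form is a power term plus a constant, $T \simeq a \cdot p^k + b$; the function name and the timing values are hypothetical (the timings are synthetic, generated from known coefficients, and are not the measurements shown in Fig. 7).

```python
import numpy as np


def fit_power_plus_constant(p, t, k):
    """Least-squares fit of t ~ a * p**k + b for a fixed exponent k; returns (a, b)."""
    p = np.asarray(p, dtype=float)
    t = np.asarray(t, dtype=float)
    design = np.column_stack([p ** k, np.ones_like(p)])
    (a, b), *_ = np.linalg.lstsq(design, t, rcond=None)
    return float(a), float(b)


# Synthetic timings in seconds (NOT measured data), built from known coefficients.
sizes = np.array([50, 100, 200, 500, 1000, 2000, 5000, 10000])
t_quadratic = 3e-6 * sizes ** 2 + 0.01

a, b = fit_power_plus_constant(sizes, t_quadratic, k=2)
print(a, b)
```

With real measurements one would call the same function with `k=2` for the TMFG timings and `k=3` for the PMFG timings and compare the residuals.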

FIG. 7: Demonstration that TMFG is faster and more scalable than the PMFG. Comparison 
between execution times for TMFG and PMFG for different values of p ranging between 50 and 10000. 
Lines are the 2-parameter best polynomial fits (see text). 

We have then compared the total retained edge weight for the following four variants of the 
TMFG construction: 

1. TMFG: the base version of the algorithm. It uses only T2 operators. This version of the 
algorithm produces chordal graphs. 

2. TMFG-T1: uses T2 followed by an optimisation stage where a number of T1 moves are 
performed after every insertion of a new vertex. 

3. TMFG-S: a variant of the basic algorithm with T2 followed by local optimization with S. 

4. TMFG-A: a variant of the algorithm with T2 followed by local optimization with T1 and 
A. 
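
To make the base variant concrete, here is a minimal sketch of a greedy construction that uses only vertex insertions into triangular faces. It rests on assumptions that are not stated in this section: the gain of an insertion is taken to be the sum of the three new edge weights, and the initial tetrahedron is seeded with the four vertices of largest row sum. The function name `tmfg_t2` is ours. The search over all vertex-face pairs is deliberately naive, so this sketch does not reproduce the execution-time scaling reported for the actual algorithm.

```python
import itertools

import numpy as np


def tmfg_t2(W):
    """Naive greedy T2-only construction on a symmetric weight matrix W.

    Returns (edges, cliques, separators).
    """
    p = W.shape[0]
    # Assumed seed: the four vertices with the largest row sums form the first tetrahedron.
    seed = [int(i) for i in np.argsort(W.sum(axis=1))[-4:]]
    edges = {tuple(sorted(e)) for e in itertools.combinations(seed, 2)}
    faces = [tuple(f) for f in itertools.combinations(seed, 3)]
    cliques, separators = [tuple(sorted(seed))], []
    remaining = set(range(p)) - set(seed)

    while remaining:
        # T2 move: choose the (vertex, face) pair with the largest sum of new edge weights.
        v, face = max(
            ((v, f) for v in sorted(remaining) for f in faces),
            key=lambda vf: sum(W[vf[0], u] for u in vf[1]),
        )
        a, b, c = face
        edges |= {tuple(sorted((v, u))) for u in face}
        faces.remove(face)                           # the chosen face is split ...
        faces += [(a, b, v), (a, c, v), (b, c, v)]   # ... into three new triangular faces
        separators.append(tuple(sorted(face)))       # the old face becomes a separator
        cliques.append(tuple(sorted(face + (v,))))   # the insertion creates a 4-clique
        remaining.remove(v)
    return edges, cliques, separators


# Hypothetical input: a random symmetric 12 x 12 weight matrix with zero diagonal.
rng = np.random.default_rng(0)
A = rng.random((12, 12))
W = (A + A.T) / 2
np.fill_diagonal(W, 0.0)
edges, cliques, separators = tmfg_t2(W)
print(len(edges), len(cliques), len(separators))
```

For p vertices the result has 3p - 6 edges, p - 3 four-cliques and p - 4 triangular separators, and the clique and separator lists come for free from the sequence of insertions.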

We have tested 9 types of random weight matrices W with different weight distributions: 

1. Beta distribution with shape parameters α = 0.5 and β = 3. This distribution is heavily 
skewed and is characterised by very low density on the right side of the interval [0,1]. 

2. Beta distribution with shape parameters α = 3 and β = 0.5. This distribution is skewed in 
the opposite direction and has a high density near the right extreme of the interval [0,1]. 

3. Pareto distribution with power law exponent equal to 1. This distribution has a fat tail. 

4. Pareto distribution with power law exponent equal to 2. This distribution still has a fat 
tail, but thinner than the previous one. 
Weight matrix coefficients   TMFG/     TMFG-T1/  TMFG-S/   TMFG-A/   TMFG (Time)/
distribution                 PMFG      PMFG      PMFG      PMFG      PMFG (Time)

Beta(0.5, 3)                 95.42%    96.24%    95.72%    99.89%    0.16%
Beta(3, 0.5)                 104.70%   104.73%   104.77%   104.80%   0.14%
Pareto(1)                    99.97%    99.97%    99.97%    99.97%    0.17%
Pareto(2)                    97.94%    98.00%    98.02%    98.32%    0.17%
Random Matrix (20 factors)   102.23%   102.63%   102.57%   103.77%   0.22%
Random Matrix (50 factors)   100.30%   100.82%   100.64%   102.54%   0.21%
Random Matrix (100 factors)  98.46%    99.14%    98.86%    101.42%   0.21%
Uniform                      116.27%   116.29%   116.34%   116.89%   0.15%
Real correlation matrix      100.11%   100.17%   100.24%   100.42%   0.15%

TABLE I: Average relative performance (ratio between sums of edge weights) of the TMFG algorithm 
with respect to the PMFG. Four TMFG variants and nine different weight distributions. Note that 
TMFG and TMFG-S are chordal graphs. 


5. Random matrix of correlations of 400 time series generated by simulating 20 normally 
distributed common factors. 

6. Random matrix of correlations of 400 time series generated by simulating 50 common 
factors. This matrix shows less structure than the one generated using 20 factors. 

7. Random matrix of correlations of 400 time series generated by simulating 100 common 
factors. This matrix shows less structure than the two above. 

8. Uniform distribution over [0,1]. 

9. Square of the coefficients of a real correlation matrix computed from daily log-returns of 
342 US stocks, across a period of 15 years (from Jan 1997 to Jul 2012) (see [TU]). 
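
Synthetic weight matrices of the kinds listed above can be generated along the following lines. This is a sketch under assumptions rather than the procedure actually used: the helper names are ours, the "power law exponent" is read as the shape parameter of a classical Pareto law with minimum 1, and the factor loadings, noise level and number of observations in the factor model are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)


def symmetric_from_sampler(sampler, p):
    """Symmetric p x p weight matrix whose off-diagonal entries are i.i.d. draws."""
    W = np.zeros((p, p))
    iu = np.triu_indices(p, k=1)
    W[iu] = sampler(len(iu[0]))
    return W + W.T


def factor_correlation(p, n_factors, n_obs, rng):
    """Correlation matrix of p series driven by normally distributed common factors.

    Gaussian loadings and unit-variance idiosyncratic noise are assumptions of this sketch.
    """
    factors = rng.standard_normal((n_obs, n_factors))
    loadings = rng.standard_normal((n_factors, p))
    series = factors @ loadings + rng.standard_normal((n_obs, p))
    return np.corrcoef(series, rowvar=False)


p = 400
W_beta_low = symmetric_from_sampler(lambda n: rng.beta(0.5, 3.0, n), p)
W_beta_high = symmetric_from_sampler(lambda n: rng.beta(3.0, 0.5, n), p)
# numpy's `pareto` draws the shifted (Lomax) form, hence the + 1 for a classical Pareto law.
W_pareto_1 = symmetric_from_sampler(lambda n: 1.0 + rng.pareto(1.0, n), p)
W_uniform = symmetric_from_sampler(lambda n: rng.uniform(0.0, 1.0, n), p)
C_20 = factor_correlation(p, n_factors=20, n_obs=1000, rng=rng)
```

Fewer factors concentrate the dependence in a low-dimensional subspace, which is why the 20-factor matrix is the most structured of the three factor-based cases.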

All matrices are symmetric and have size p = 400, except the real correlation data, which have 
size p = 342. For all the weight matrices (except the real correlation matrix) we have 
compared results for 100 samples. For the real correlation matrices we generated the samples by 
randomly choosing the starting points of 100 time windows of length 1000 data points over a 
period of 4500 points in total. Table I reports the average relative performance, defined as the 
ratio between the sum of the edge weights in each of the four variants of TMFG and the sum 
of the edge weights in the PMFG. It shows that the PMFG is usually more effective when the 
density of high weights is low, while the TMFG is more effective when the density of high weights 
is higher or limited. This result is to be expected since the PMFG is less constrained than the 
TMFG in picking up isolated high-weight edges one at a time, while the TMFG is more efficient 
in selecting subsets of edges with a high total sum. For the random matrices of correlation we 
see that the TMFG performs better than the PMFG in filtering the more structured matrix 
generated using 20 factors. In the real case we see that the TMFG is marginally better than 
the PMFG. We conclude that TMFG in general performs comparably to, and sometimes 
better than, the PMFG. We observe that the TMFG tends to improve its relative performance as 
the size of the matrix increases (see Table II). When other moves are used TMFG improves 
its performance, with the best performance obtained by the TMFG-A variant. 
Weight matrix   TMFG/     TMFG-T1/  TMFG-S/   TMFG-A/
size p          PMFG      PMFG      PMFG      PMFG

50              88.68%    88.82%    89.35%    95.93%
100             90.49%    93.13%    92.31%    98.14%
150             92.14%    93.77%    90.44%    95.26%
300             93.73%    95.6%     94.63%    100.06%
500             96.36%    96.79%    96.6%     100.98%
700             98.83%    100.49%   98.92%    103.58%
850             98.93%    99.95%    99.99%    103.83%
1000            100.33%   100.56%   100.71%   105.39%
1200            101.34%   102.16%   101.16%   105.24%

TABLE II: Example of relative increase in performance of the TMFG algorithm with respect to PMFG 
when dimensionality p increases. The underlying distribution is a Beta(0.5, 3). 


VI. CONCLUSIONS 

We have described a new family of algorithms, TMFG, to retrieve approximate solutions to 
the Weighted Maximal Planar Graph problem, which have the following desirable characteristics: 

1. TMFG is faster than PMFG: its execution time increases as the square of the weight 
matrix size p, while that of PMFG increases with the third power of p. 

2. TMFG constructed with T2 and S only produces chordal graphs. This opens the door 
to the use of Markov Random Fields as a modelling tool. The fact that the maximum 
dimension of cliques is controlled by the topology entails that efficient inference algorithms 
can be used. 

3. From TMFG construction the structure of cliques and separators is automatically retrieved. 

4. The local nature of the TMFG construction allows one to model dependence using 
genuinely multivariate score functions over the clique elements such as Mutual Information, 
Total Correlation, Partial Correlation, Likelihood functions etc. (as opposed to bivariate 
functions such as correlation). 

5. TMFG does not require preliminary calculation and sorting of all the values of the score 
function. In cases where the distribution is based on 3 or more variables, such a requirement 
would be severe. 

6. In cases where p ≫ q the memory footprint of the algorithm can be reduced by keeping the 
p × q data points in memory and calculating the dependence function on demand, instead 
of storing the $p^2$ values of a correlation matrix or the $p^d$ values of a d-variate 
dependence function. 

7. The use of local and non-local operators (T1, T2, A, S) allows one to change the topology 
of the network in an easy and controlled way as the network evolves. 

8. The geometric formulation of the algorithm allows one to consider generalisations to higher 
genus and to simplicial complex structures. 

9. The limited treewidth of the filtered network makes it amenable to efficient (and often 
exact) inference [TO] . 
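
The memory-saving idea of computing the dependence function on demand can be sketched as follows for the simplest case of Pearson correlation. This is only an illustration under assumptions: the class name `OnDemandCorrelation` and the data dimensions are hypothetical, and the data are assumed to be stored as a p × q array with one row per variable and no constant rows.

```python
import numpy as np


class OnDemandCorrelation:
    """Pearson correlations computed pair by pair from a p x q data array."""

    def __init__(self, data):
        data = np.asarray(data, dtype=float)
        centered = data - data.mean(axis=1, keepdims=True)
        # Rows are scaled to unit Euclidean norm, so a dot product of two rows
        # equals the correlation between the corresponding variables.
        self._z = centered / np.linalg.norm(centered, axis=1, keepdims=True)

    def __call__(self, i, j):
        """Correlation between variables i and j."""
        return float(self._z[i] @ self._z[j])

    def row(self, i, others):
        """Correlations between variable i and the variables listed in `others`."""
        return self._z[others] @ self._z[i]


# Hypothetical data: p = 1000 variables observed at q = 50 points.
rng = np.random.default_rng(0)
data = rng.standard_normal((1000, 50))
corr = OnDemandCorrelation(data)

print(corr(3, 7))
print(corr.row(3, [7, 8, 9]))
```

Only the p × q standardized array is kept in memory; each requested weight costs one dot product of length q, so the full p × p matrix is never materialized.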

Future developments of the TMFG method are in the direction of building networks with a 
richer structure, beyond and also below planarity, while keeping a controlled complexity. We 
intend to apply these filtered networks to sparse modeling, with applications to physical, 
financial and biological systems. We intend to develop applications of this filtered network 
construction and of efficient inference, to use this tool to identify large-scale features of 
financial networks in different regimes, and to apply the inference algorithms to a number of 
problems in financial risk management, such as risk aggregation and allocation, simulation, 
stress testing and incorporation of non-homogeneous sources of information (such as asset 
prices, macroeconomic variables, and also expert opinion) into a risk management framework. 